This section demonstrates how to use treeio to parse a tree and its associated data into a single object in R.

This lesson assumes a basic familiarity with R and data frames.

This lesson does not cover methods and software for generating phylogenetic trees, nor does it cover interpreting phylogenies. Here’s a quick primer on how to read a phylogeny that you should review before this lesson, though it is by no means exhaustive. Genome-wide sequencing allows examination of the entire genome, and many methods and software tools exist for comparative genomics using SNP- and gene-based phylogenetic analysis, whether from unassembled sequencing reads, draft assemblies/contigs, or complete genome sequences. These methods are beyond the scope of this lesson.

The treeio Package

treeio is an R package designed for phylogenetic tree data input and output. It is released as part of the Bioconductor and rOpenSci projects.

  1. treeio Bioconductor page: https://www.bioconductor.org/packages/treeio.
  2. treeio homepage: https://guangchuangyu.github.io/treeio.

Just like R packages from CRAN, you only need to install Bioconductor packages once (instructions here), then load them every time you start a new R session.
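
The one-time installation step is commonly done through the BiocManager package. The following is a minimal sketch, assuming a reasonably recent version of R and a working internet connection; it may differ from the linked instructions.

```r
# Install BiocManager from CRAN if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install treeio from Bioconductor (needed only once per R installation)
BiocManager::install("treeio")
```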

library(treeio)

Getting tree data from evolutionary analysis results

Most tree viewer software (including R packages) focuses on the Newick and NEXUS file formats. The output files of other evolutionary analysis software, however, may also contain supporting evidence and analysis findings that can be further analyzed in R or interpreted in a phylogenetic context to help identify evolutionary patterns.

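treeio supports several file formats, including Newick, NEXUS, New Hampshire eXtended (NHX), jplace and PHYLIP, and software output from BEAST, EPA, HyPhy, MrBayes, PAML (BaseML and CodeML), PHYLDOG, pplacer, r8s, RAxML and RevBayes.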

The treeio package implements several parser functions.

Parser functions defined in treeio

Parser function    Description
read.beast         parsing output of BEAST
read.codeml        parsing output of CodeML (rst and mlc files)
read.codeml_mlc    parsing mlc file (output of CodeML)
read.hyphy         parsing output of HYPHY
read.jplace        parsing jplace file including output of EPA and pplacer
read.mrbayes       parsing output of MrBayes
read.newick        parsing newick string, with ability to parse node label as support values
read.nhx           parsing NHX file including output of PHYLDOG and RevBayes
read.paml_rst      parsing rst file (output of BaseML or CodeML)
read.phylip        parsing phylip file (phylip alignment + newick string)
read.r8s           parsing output of r8s
read.raxml         parsing output of RAxML
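
As a small illustration of one of the parsers listed above, the sketch below reads a Newick tree whose internal node labels are treated as support values. The tree string, the tip names and the values 90 and 80 are made up for this example, and the temporary file stands in for a real tree file.

```r
library(treeio)

# Write a small, made-up Newick tree to a temporary file;
# the internal node labels (90, 80) stand in for support values
newick_file <- tempfile(fileext = ".nwk")
writeLines("((A:1,B:1)90:1,(C:1,D:1)80:1);", newick_file)

# node.label = "support" asks read.newick to parse node labels as support values
tree <- read.newick(newick_file, node.label = "support")

# Inspect the class of the result and print a summary of the parsed tree
class(tree)
tree
```

With the default node.label = "label", the node labels are kept as plain labels rather than being converted to support values.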

After parsing, the tree structure and its associated data are stored in an S4 class, treedata, defined in the treeio package. The parsed data are mapped to the branches and nodes of the tree inside the treedata object, so that they can be used efficiently to annotate the tree visually with the ggtree package.
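
To make the treedata object more concrete, here is a hedged sketch that parses the example BEAST file assumed to ship with the treeio package (extdata/BEAST/beast_mcc.tree) and looks at what was stored. The variable names beast_file and beast are chosen only for illustration.

```r
library(treeio)

# Example BEAST output shipped with the treeio package
beast_file <- system.file("extdata/BEAST", "beast_mcc.tree", package = "treeio")
beast <- read.beast(beast_file)

# Names of the variables that were parsed and mapped onto the tree
get.fields(beast)

# The tree structure itself is kept as an ape "phylo" object in the phylo slot
class(beast@phylo)

# The associated per-node data are kept in the data slot
head(beast@data)
```

Accessing the slots directly with @ is convenient for a quick look; in analysis code, accessor functions such as get.fields are usually the safer choice.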